European-language-classification | Challenge for startup.ml

 by nieoh | Jupyter Notebook | Version: Current | License: No License

kandi X-RAY | European-language-classification Summary

European-language-classification is a Jupyter Notebook library. It has no reported bugs or vulnerabilities, and it has low support. You can download it from GitHub.

A challenge for startup.ml; the challenges can be found on startup.ml. A writeup of the project is in the project writeup notebook.

The goal is to classify 21 European languages using data from the European Parliament Proceedings Parallel Corpus (1996-2011). Scikit-learn is the main tool used. The text is analyzed with n-grams, specifically unigrams, bigrams and trigrams, and the classifier is a simple TF-IDF vectorizer combined with a perceptron.

Only the text from the month of January, across many years, is used for training and testing. The F-score was around 0.94, which was surprising. The same algorithm was then trained on all the January text and tested against a separate test set; the F-score in this case was around 0.89.

Moving forward, I'd like to continue working on the project, optimizing the classifier with better preprocessing, more data and different algorithms.
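The TF-IDF plus perceptron approach described above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the notebook's actual code. The toy sentences are hypothetical stand-ins for Europarl text, and several choices are assumptions the summary does not state: character-level n-grams (`analyzer="char_wb"`) rather than word n-grams, the `Perceptron` settings, and macro-averaged F1 as the reported F-score.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for Europarl sentences.
# Labels are ISO 639-1 language codes.
train_texts = [
    "The session is resumed.",
    "I declare the debate closed.",
    "Die Sitzung wird wiederaufgenommen.",
    "Ich erkläre die Aussprache für geschlossen.",
    "La séance est reprise.",
    "Je déclare le débat clos.",
]
train_labels = ["en", "en", "de", "de", "fr", "fr"]

test_texts = [
    "The vote will take place tomorrow.",
    "Die Abstimmung findet morgen statt.",
    "Le vote aura lieu demain.",
]
test_labels = ["en", "de", "fr"]


def build_classifier() -> Pipeline:
    """TF-IDF over unigrams, bigrams and trigrams, followed by a perceptron."""
    return Pipeline([
        # Assumption: character n-grams within word boundaries ("char_wb").
        # Use analyzer="word" for word-level unigrams/bigrams/trigrams instead.
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))),
        # tol=None runs all max_iter epochs instead of stopping early.
        ("clf", Perceptron(max_iter=1000, tol=None, random_state=0)),
    ])


classifier = build_classifier()
classifier.fit(train_texts, train_labels)
predictions = classifier.predict(test_texts)

# Macro-averaged F1 weights every language equally; the averaging used
# in the original writeup is not stated (an assumption here).
score = f1_score(test_labels, predictions, average="macro")
print(f"macro F1: {score:.2f}")
```

With real data, `train_texts` and `test_texts` would be replaced by sentences loaded from the Europarl text files (for example, only the January files for the first experiment). The loading step is omitted because the summary does not describe the file layout. A pipeline keeps the vectorizer's vocabulary fitted on training text only, so the test set is transformed with the same features.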

            Support

              European-language-classification has low activity.
              It has 0 stars and 1 fork. There are 2 watchers for this library.
              It had no major release in the last 6 months.
              European-language-classification has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of European-language-classification is current.

            Quality

              European-language-classification has no bugs reported.

            Security

              European-language-classification has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              European-language-classification does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            Reuse

              European-language-classification releases are not available. You will need to build from source code and install.


            Vulnerabilities

            No vulnerabilities reported

            Install European-language-classification

            You can download it from GitHub.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.

            CLONE
          • HTTPS

            https://github.com/nieoh/European-language-classification.git

          • CLI

            gh repo clone nieoh/European-language-classification

          • sshUrl

            git@github.com:nieoh/European-language-classification.git
